Identification of Critical Factors in Checkpointing Based Multiple Fault Tolerance for Distributed System

Authors

  • Sanjay Bansal
  • Sanjeev Sharma
Abstract

The performance of checkpointing-based multiple fault tolerance is low, mainly because of the overheads associated with checkpointing. A checkpointing algorithm can be improved through a better storage strategy and better checkpoint scheduling, both of which reduce these overheads. Performance and efficiency are the most desirable features of checkpointing-based recovery. This paper discusses the critical issues involved in fast and efficient checkpointing-based recovery, along with the impact of each issue on recovery performance. Relationships among the issues are also explored. Finally, coordinated and uncoordinated checkpointing are compared with respect to these issues.
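The link between checkpoint scheduling and overhead can be illustrated with a small sketch. It is not taken from the paper: it assumes a simple first-order cost model in which each checkpoint costs a fixed time and a failure loses, on average, half an interval of work, and it uses Young's well-known approximation for the interval that minimizes that cost. The function names and the sample values (a 60-second checkpoint, a one-day mean time between failures) are hypothetical.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order approximation of the checkpoint interval.

    checkpoint_cost: time to write one checkpoint (seconds, assumed constant).
    mtbf: mean time between failures (seconds).
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def overhead_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """Approximate fraction of time lost under the assumed model.

    First term: checkpointing cost paid once per interval.
    Second term: expected rework, about half an interval per failure.
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


if __name__ == "__main__":
    cost, mtbf = 60.0, 86400.0  # hypothetical values
    best = young_interval(cost, mtbf)
    for label, t in (("half", best / 2), ("optimal", best), ("double", best * 2)):
        print(f"{label:>7}: interval={t:8.1f}s overhead={overhead_fraction(t, cost, mtbf):.4f}")
```

Under these assumptions, checkpointing either more or less often than the computed interval raises the total overhead. A real system would also need to account for recovery time, storage latency and coordination cost, which this model ignores.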

Similar resources

Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid

Grid computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling are important issues for large-scale computational grids because of the unreliable nature of grid resources. A commonly exploited technique to realize fault tolerance is periodic checkpointing that periodically ...

An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment

Mobile computing systems are made up of different components, among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environments. In the suggested scheme, nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and, as a result, the workload of Mobile ...

A Novel adaptive Checkpointing method based on Information obtained from Workflow Structure

Scientific workflows are data- and compute-intensive; thus, they may run for days or even weeks on parallel and distributed infrastructures such as grids, supercomputers, and clouds. In these high-performance computing infrastructures, the number of failures that can arise during scientific-workflow enactment can be high, so the use of fault-tolerance techniques is unavoidable. The most-frequentl...

Minimum-Process Synchronous Checkpointing in Mobile Distributed Systems

Checkpointing is an efficient fault tolerance technique used in distributed systems. Due to the emerging challenges of mobile distributed systems, such as low bandwidth, mobility, lack of stable storage, frequent disconnections and limited battery life, fault tolerance techniques designed for distributed systems cannot be directly implemented on mobile distributed systems (MDSs). This research pape...

Performance and effectiveness trade-off for checkpointing in fault-tolerant distributed systems

Checkpointing has a crucial impact on systems' performance and fault tolerance effectiveness: excessive checkpointing results in performance degradation, while deficient checkpointing incurs expensive recovery. In distributed systems with independent checkpoint activities there is no easy way to determine checkpoint frequencies optimizing response time and fault tolerance costs at the same time...

Publication date: 2011